Exercise 2

In the first exercise you got used to Python by writing several functions that use NumPy to manipulate one- and two-dimensional arrays. Let's now add more functions to our toolbox and write some code that can actually do something useful.

1: BINARY CLASSIFIER: Your linearClassifier() function from last week was just a function that takes a 1-D vector $x$ and projects it onto another 1-D space via the function $softmax(Wx+b)$. The reason we call this a classifier is that it takes an input space of “features” in $x$ and maps it to an output space of “classes” with the same dimension as $b$. An example, taken from http://cs231n.github.io/linear-classify/ , is to take the pixel values of an image (flattened into a vector $x$) and project them onto three classes (cat, dog, ship). As you can see from the image, the highest score for the given weights is for a dog (this classifier needs to be trained)! The purpose of the $softmax()$ function is to turn the “scores” into probabilities. Try running your $softmax()$ function on the scores below; you should get [ 6.06373937e-233, 1.00000000e+000, 5.33322036e-164], which is basically [0, 1, 0].
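
As a quick check, here is a minimal sketch of that computation. The three score values are the ones I read off the cs231n figure linked above (so treat them as an assumption), and the softmax() shown here is just one possible, numerically stable version; use your own from last week.

import numpy as np

def softmax(z):
    # One possible (numerically stable) softmax: subtract the largest score
    # before exponentiating, then normalize so the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

scores = np.array([-96.8, 437.9, 61.95])  # cat, dog, ship scores from the cs231n figure
print(softmax(scores))  # ~[6.06e-233, 1.0, 5.33e-164], i.e. basically [0, 1, 0]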

With the function presented in this way, the problem of classification is reduced to finding the $W$ and $b$ that will give the correct answers for most images of a cat, dog or ship. This will require “training” the weights by giving our classifier a lot of examples and adjusting as we go. The most common form of “adjusting” is called gradient descent, and you'll explore how to do it in detail in the code below.
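
To make “adjusting” a little more concrete (this is just a sketch of the idea; the notes linked below define it properly): if $L(W,b)$ is a loss function that measures how wrong the current predictions are, one gradient-descent step replaces $W$ with $W - \alpha\,\partial L/\partial W$ (and likewise for $b$), where the learning rate $\alpha$ sets how big a step to take.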

In order to learn how to do this ourselves, we're going to create an even more basic classifier, a binary classifier. A binary classifier is a special case of the above linear classifier, which maps a feature vector $x$ to a single number. In this special case the weight matrix ${\bf W}$ is just a vector. For now, we'll even leave out the bias term, so the classifier reduces to the dot product ${\bf W}\cdot {\bf x}$, a single real number. However, we can't apply the $softmax()$ function to turn this number into a probability, since $softmax()$ normalizes over a vector of classes. We want to map a single real number to a probability between 0 and 1 (e.g., is the above image a cat or not?). For this we can use the sigmoid function.
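
In code, the classifier's “score” is nothing more than a dot product (the numbers here are purely illustrative):

import numpy as np

x = np.array([0.2, 0.7, 0.1, 0.5])   # a tiny made-up feature vector
W = np.array([1.0, -2.0, 0.5, 0.3])  # the weight vector, one weight per feature
score = np.dot(W, x)                 # a single real number, not yet a probability
print(score)                         # about -1.0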

2: SIGMOID FUNCTION: Last exercise had us create a $softmax()$ function, which is a generalization of the logistic function: a function whose output is within the range [0,1] and has a characteristic "S" shape. Since $softmax()$ is designed to work over a vector of classes and not a single number, we'll need to add another function to our toolbox, the sigmoid function $\sigma(x) = 1/(1+e^{-x})$. Your first task is to write this function.


In [1]:
# Define your sigmoid function 
def sigmoid(x):
    # This is where you write your code!
    return x  # placeholder -- replace this line with the sigmoid of x

I will include my version later, but try it yourself first after looking at the definition.
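
If you want something concrete to check against, here is one possible implementation together with a couple of sanity checks (just a sketch, not necessarily the version I will post later):

import numpy as np

def sigmoid(x):
    # Works on scalars and element-wise on NumPy arrays.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                           # 0.5
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.54e-05, 0.5, 0.99995]

With a working sigmoid(), the binary classifier described above is simply sigmoid(np.dot(W, x)).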

3: CIFAR-10: For this week’s exercise, I will provide my code for a binary classifier, which you will apply to the CIFAR-10 image data set. The assignment consists of several steps:

A) Read through the first section of the following notes from the Stanford course on machine learning, which focuses on simple gradient descent and stochastic gradient descent. Then skip to section 5, which talks about logistic regression.

B) Go to my Github page to download all the code needed to perform the classification. Put all the files into a working directory (folder) on your computer. You will also need to get the CIFAR-10 data (the Python version) and unpack it in the same working directory.

C) Read through the CIFAR-10 page to be sure you understand what the data looks like (there is a short loading sketch after this list).

D) Read through the Python code, making sure you understand all of the steps. The file '''binary_classifier.py''' is the one that performs the gradient descent, so be sure that you follow the mathematics and compare it to the lecture notes above (a bare-bones sketch of the same idea also appears after this list).

E) Run the code (cifar_binary.py) to train the classifier and see how it does. Then start to adjust the parameters (the learning rate $\alpha$ and the number of epochs in particular) and see how they change the performance. Is there a number of epochs beyond which the classification starts to get worse? That happens when you overfit the training data, and the weights are no longer general enough to work on the new validation data set.
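
For step C, loading one training batch looks roughly like this, assuming the pickled file layout described on the CIFAR-10 page (the file name below is one of the batches that ships with the Python version):

import pickle
import numpy as np

with open('data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='bytes')

data = np.array(batch[b'data'])      # 10000 x 3072 array of uint8 pixel values
labels = np.array(batch[b'labels'])  # 10000 labels, integers from 0 to 9
print(data.shape, labels.shape)

For step D, the following is a bare-bones sketch of what logistic-regression gradient descent does. It is not the contents of binary_classifier.py, just the idea you should recognize when reading that file:

import numpy as np

def train(X, y, alpha=1e-5, epochs=10):
    # X: (N, D) matrix of feature vectors; y: (N,) labels in {0, 1}.
    W = np.zeros(X.shape[1])
    for epoch in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X.dot(W)))  # sigmoid of the scores
        grad = X.T.dot(p - y) / len(y)       # gradient of the cross-entropy loss
        W -= alpha * grad                    # one gradient-descent step
    return W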

4: BEST PERFORMANCE: Copy the parameters that gave what you consider your best performance into a text file and post it on ICON.

5: ADVANCED: We left out the bias term in the classifier. The term can be included in the weight vector by appending one extra entry (the bias) to ${\bf W}$ and appending a 1 to the feature vector $x$ (increasing its length by 1). Give it a try and see whether it changes the performance.
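
Here is a sketch of that "bias trick"; the names and sizes are only for illustration, not taken from my code:

import numpy as np

x = np.random.rand(3072)          # a flattened image (feature vector)
W = np.random.randn(3072) * 1e-3  # weight vector
b = 0.5                           # bias term

x_aug = np.append(x, 1.0)  # feature vector with a 1 tacked onto the end
W_aug = np.append(W, b)    # weight vector with the bias tacked onto the end

# The augmented dot product reproduces W.x + b exactly:
print(np.allclose(np.dot(W_aug, x_aug), np.dot(W, x) + b))  # True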